Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
Abstract
The growing importance of massive datasets with the advent of deep learning makes robustness to label noise a critical property for classifiers. Sources of label noise include automatic labeling for large datasets, non-expert labeling, and label corruption by data poisoning adversaries. In the last case, corruptions may be arbitrarily bad, even so bad that a classifier predicts the wrong labels with high confidence. To protect against such sources of noise, we leverage the fact that a small set of clean labels is often easy to procure. We demonstrate that robustness to label noise up to severe strengths can be achieved by using a set of trusted data with clean labels, and propose a loss correction that utilizes trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers. Across vision and natural language processing tasks, we experiment with various kinds of label noise at several strengths, and show that our method significantly outperforms existing methods.
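The abstract does not spell out the form of the loss correction, so the following is only a minimal sketch of one common way trusted data can be used, under stated assumptions: a classifier trained on the noisy labels is evaluated on the trusted examples to estimate a label-corruption matrix, and the loss on noisy examples is then computed through that matrix. The function names, the matrix-based formulation, and the toy numbers are illustrative assumptions, not details taken from the paper.

```python
import math


def estimate_corruption_matrix(noisy_probs, trusted_labels, num_classes):
    """Estimate C[i][j] ~ p(noisy label = j | true label = i).

    Assumption: noisy_probs[k] is the softmax output of a model trained on
    noisy labels, evaluated on trusted example k, whose clean label is
    trusted_labels[k]. Row i is the mean prediction over trusted examples
    of class i.
    """
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for probs, label in zip(noisy_probs, trusted_labels):
        counts[label] += 1
        for j in range(num_classes):
            sums[label][j] += probs[j]
    matrix = []
    for i in range(num_classes):
        if counts[i] == 0:
            # No trusted example of this class: fall back to "no corruption".
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            matrix.append([s / counts[i] for s in sums[i]])
    return matrix


def corrected_nll(clean_probs, noisy_label, matrix):
    """Negative log-likelihood of a noisy label under the corrected model.

    The model's clean-label distribution is pushed through the corruption
    matrix: p(noisy = j) = sum_i p(clean = i) * C[i][j].
    """
    p = sum(clean_probs[i] * matrix[i][noisy_label]
            for i in range(len(clean_probs)))
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)


# Toy two-class example with three trusted points (made-up numbers).
C = estimate_corruption_matrix(
    noisy_probs=[[0.7, 0.3], [0.9, 0.1], [0.2, 0.8]],
    trusted_labels=[0, 0, 1],
    num_classes=2,
)
loss = corrected_nll(clean_probs=[1.0, 0.0], noisy_label=1, matrix=C)
```

In this sketch, a model that confidently predicts class 0 for an example labeled 1 is penalized only in proportion to how often the estimated matrix says class 0 is corrupted into class 1, rather than being forced to fit the possibly wrong label. On trusted examples the ordinary cross-entropy would be used unchanged.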
Similar resources
MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels
Recent studies have discovered that deep networks are capable of memorizing the entire data even when the labels are completely random. Since deep models are trained on big data where labels are often noisy, the ability to overfit noise can lead to poor performance. To overcome the overfitting on corrupted training data, we propose a novel technique to regularize deep networks in the data dimen...
How Do Neural Networks Overcome Label Noise?
This work provides an analytical expression for the effect of label noise on the performance of deep neural networks. [Figure 1: Different types of random label noise. Panels: (a) 5 of MNIST's 10 classes, with clean labels; (b) 20% random noise, 100% network prediction accuracy; (c) 20% randomly spread flip noise, 100% accuracy; (d) 20% locally concentrated noise, 80% accuracy.] DNNs are extremely resistant ...
Credit Risk Measurement of Trusted Customers Using Logistic Regression and Neural Networks
Credit risk and deferred bank claims are among the most sensitive issues in the banking industry and can be considered a main cause of bank failures. In recent years, the economic slowdown accompanied by inflation in Iran has led to an increase in deferred bank claims that could put the country's banking system in serious trouble. Accordingly, the current paper presents a prediction...
Adaptive Filtering Strategy to Remove Noise from ECG Signals Using Wavelet Transform and Deep Learning
Introduction: An electrocardiogram (ECG) measures the electrical activity of the heart and is recorded by placing electrodes on the surface of the body. Physicians use observation tools to detect and diagnose heart diseases; cardiologists do the same with ECG signals. In particular, heart diseases are recognized by examining the graphic representation of heart signals w...
A New Method for Assigning Membership to Data and Identifying Noise and Outliers Using a Fuzzy Support Vector Machine
Support Vector Machine (SVM) is an important classification technique that has recently attracted the attention of many researchers. However, this approach has some limitations. Determining the hyperplane that separates classes with the maximum margin and calculating the position of each point (training data) in a linear SVM classifier can be interpreted as computing a data membersh...
Journal: CoRR
Volume: abs/1802.05300
Publication date: 2018